Functionality to Identify and Assign New Aliases #11
Open
jmcbroome wants to merge 14 commits into corneliusroemer:main
Proposed Changes
This pull request includes some additional functionality I wrote for identifying and assigning the next available alias code to arbitrary lineages in the course of working on my automated lineage designation pipeline. You might appreciate its addition to your package to assist in your own designation workflows, as well as for other users, though I would understand if you feel this is outside the scope of this particular tool or have concerns about these methods causing confusion for users who are not interested in designating lineages.
In terms of implementation, it converts the existing Pango aliases into base-26 numbers, finds the maximum, and increments it by 1 to get the next available alias. Banned letters (I, O, and X) are handled by incrementing past them when the number is converted back into an alias string. Recombinant lineages (prefixed with X) are tracked as a separate group, and the same functions apply to them when the appropriate parameter is set.
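For readers who want to see the idea concretely, here is a minimal, self-contained sketch of the increment-and-skip approach. It is not the code in this PR: the function names, the `recombinant` flag, and the choice of bijective base-26 (A=1 ... Z=26, so that "AA" follows "Z") are all assumptions made for illustration.

```python
import string

LETTERS = string.ascii_uppercase
BANNED = set("IOX")  # letters that must not appear in a newly issued alias


def alias_to_int(alias: str) -> int:
    """Read an alias as a bijective base-26 number (A=1, ..., Z=26, AA=27)."""
    n = 0
    for ch in alias:
        n = n * 26 + LETTERS.index(ch) + 1
    return n


def int_to_alias(n: int) -> str:
    """Inverse of alias_to_int for n >= 1."""
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        chars.append(LETTERS[rem])
    return "".join(reversed(chars))


def next_alias(existing, recombinant=False):
    """Return the next free alias; `recombinant` is a hypothetical flag name."""
    if recombinant:
        # Recombinants form their own group: drop the leading X before counting.
        pool = [a[1:] for a in existing if a.startswith("X") and len(a) > 1]
    else:
        pool = [a for a in existing if not a.startswith("X")]
    n = max((alias_to_int(a) for a in pool), default=0) + 1
    # Keep incrementing until no banned letter appears anywhere in the string.
    while BANNED & set(int_to_alias(n)):
        n += 1
    return ("X" if recombinant else "") + int_to_alias(n)
```

With this sketch, `next_alias(["A", "B", "AY", "BA"])` gives `"BB"`, `next_alias(["H"])` skips the banned `I` and gives `"J"`, and `next_alias(["A", "XA", "XB"], recombinant=True)` gives `"XC"`.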
It includes two new methods and a small number of hidden helper functions:
Additionally, it adds a new parameter to compress() which, when True, automatically assigns a new alias string when a lineage reaches a fourth suffix level with no accepted alias. The default matches the current behavior (an error is raised for unhandled fourth suffix levels).
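The opt-in behavior can be illustrated with a toy stand-in. This is not the package's compress(): `compress_sketch`, the `assign` parameter name, the `allocate` callback, and the plain dict used as the alias table are all hypothetical, and the function only handles fully spelled-out names.

```python
def compress_sketch(name, prefix_to_alias, assign=False, allocate=None):
    """Shorten a fully spelled-out lineage name to at most three suffix levels.

    prefix_to_alias maps an unaliased prefix (e.g. "B.1.1.529") to its alias.
    allocate is a callable that receives the aliases in use and returns a new one.
    """
    parts = name.split(".")
    levels = len(parts) - 1      # numeric levels after the root letter
    hops = (levels - 1) // 3     # number of alias substitutions required
    if hops <= 0:
        return name              # three or fewer levels: nothing to do
    cut = 3 * hops + 1
    prefix = ".".join(parts[:cut])
    ending = ".".join(parts[cut:])
    if prefix not in prefix_to_alias:
        if not assign:
            # Default: an unhandled fourth suffix level is an error.
            raise KeyError(f"no alias registered for {prefix}")
        # Opt-in: register a freshly allocated alias for this prefix.
        prefix_to_alias[prefix] = allocate(list(prefix_to_alias.values()))
    return prefix_to_alias[prefix] + "." + ending
```

Given a table `{"B.1.1.529": "BA"}`, `compress_sketch("B.1.1.529.2", table)` returns `"BA.2"`, while `compress_sketch("B.1.1.7.1", table)` raises unless `assign=True` and an `allocate` callable are passed. Keeping the error as the default means existing callers see no change in behavior.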
It's worth noting that I did not write code to automatically export an updated alias_key.json. Some of the information in alias_key.json is lost on loading, because the multiple parent lineages of recombinants are not stored, so a JSON rebuilt from the attributes of the Aliasor() object would be incomplete. This could be the subject of a future update.
I have followed the guidelines posted here and here in developing and testing this code. Please let me know if I missed any additional rules, if there are unhandled cases I am not covering, or if you notice any other problems with these changes.
Testing
I've updated the tests with